---
title: README
emoji: 🏃
colorFrom: gray
colorTo: purple
sdk: static
pinned: false
license: mit
---

# Model Description
BioDistilBERT-uncased is the result of continuing the pre-training of the [DistilBERT-uncased](https://huggingface.co/distilbert-base-uncased?text=The+goal+of+life+is+%5BMASK%5D.) model on the PubMed dataset, in a continual learning fashion, for 200k training steps with a total batch size of 192.


# Initialisation
We initialise our model with the pre-trained checkpoint of the [DistilBERT-uncased](https://huggingface.co/distilbert-base-uncased?text=The+goal+of+life+is+%5BMASK%5D.) model available on Hugging Face.

# Architecture
In this model, the hidden dimension and the embedding size are both 768, and the vocabulary size is 30522. There are 6 transformer layers, and the feed-forward layer has an expansion rate of 4 (an inner dimension of 3072). Overall, the model has around 65 million parameters.
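
As a rough sanity check on these figures, the parameter count can be estimated from the architecture alone. The sketch below is illustrative and not taken from the original training code: the helper `count_parameters` is hypothetical, and it assumes a standard DistilBERT-style encoder with 512 learned position embeddings, bias terms on every linear layer, and two LayerNorms per transformer block. None of these details are stated above. The masked-language-modelling head is left out.

```python
def count_parameters(vocab_size=30522, dim=768, n_layers=6,
                     ffn_expansion=4, max_positions=512):
    """Estimate encoder parameters for a DistilBERT-style model.

    max_positions=512 is an assumption; the other defaults are the
    values listed in the Architecture section.
    """
    ffn_dim = dim * ffn_expansion  # 768 * 4 = 3072

    # Token embeddings + position embeddings + embedding LayerNorm (weight, bias)
    embeddings = vocab_size * dim + max_positions * dim + 2 * dim

    # Self-attention: query, key, value and output projections, each with a bias
    attention = 4 * (dim * dim + dim)

    # Feed-forward: dim -> ffn_dim -> dim, each linear layer with a bias
    feed_forward = (dim * ffn_dim + ffn_dim) + (ffn_dim * dim + dim)

    # Two LayerNorms per block, each with a weight and a bias vector
    layer_norms = 2 * (2 * dim)

    per_layer = attention + feed_forward + layer_norms
    total = embeddings + n_layers * per_layer
    return {"embeddings": embeddings, "per_layer": per_layer, "total": total}


counts = count_parameters()
for name, value in counts.items():
    print(f"{name}: {value:,}")
```

Under these assumptions the estimate comes to roughly 66 million parameters, which is in line with the approximate figure quoted above. The exact number depends on which components are counted.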

# Citation
If you use this model, please consider citing the following paper:

```bibtex
@article{rohanian2023effectiveness,
  title={On the effectiveness of compact biomedical transformers},
  author={Rohanian, Omid and Nouriborji, Mohammadmahdi and Kouchaki, Samaneh and Clifton, David A},
  journal={Bioinformatics},
  volume={39},
  number={3},
  pages={btad103},
  year={2023},
  publisher={Oxford University Press}
}
```